Processing LexisNexis results


In [2]:
# import the modules we need
import os
import re

In [4]:
os.listdir('data')


Out[4]:
['LexisNexis practice.TXT', 'threebears.txt']

In [5]:
data = open('data/LexisNexis practice.TXT').read()

In [10]:
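# each document starts with a header line such as "5 of 54 DOCUMENTS",
# so counting that phrase counts the documents in the file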
data.count('of 54 DOCUMENTS')


Out[10]:
4

In [17]:
'This is a string of words'.split('i')


Out[17]:
['Th', 's ', 's a str', 'ng of words']

In [11]:
docs = data.split('of 54 DOCUMENTS')

In [12]:
len(docs)


Out[12]:
5

In [24]:
for dnum in [1,2,3,4]:
    print('This is doc number {}'.format(dnum))


This is doc number 1
This is doc number 2
This is doc number 3
This is doc number 4

In [13]:
docs[0]


Out[13]:
'\ufeff\n                           FOCUS - 1 '
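
Why `len(docs)` is 5 when the separator occurs only 4 times: `str.split` returns one more piece than there are separators, and the first piece is whatever precedes the first separator (here the byte-order mark and the `FOCUS - 1` prefix). The sketch below shows this on a made-up miniature export; `sample`, `separator` and the regular expression that trims the trailing `FOCUS - n` prefix are illustrative assumptions, not part of the original notebook.

```python
import re

# Hypothetical miniature of a LexisNexis export; the real file is much longer.
sample = (
    "\ufeff\n   FOCUS - 1 of 3 DOCUMENTS\n\nFirst article\n"
    "   FOCUS - 2 of 3 DOCUMENTS\n\nSecond article\n"
    "   FOCUS - 3 of 3 DOCUMENTS\n\nThird article\n"
)
separator = "of 3 DOCUMENTS"
pieces = sample.split(separator)

# split() always returns one more piece than there are separators.
assert len(pieces) == sample.count(separator) + 1

# The first piece is the preamble; the rest are the documents.
preamble, articles = pieces[0], pieces[1:]

# Each piece except the last ends with the next document's "FOCUS - n"
# prefix, so trim it off (assumed pattern, adjust to the real export).
cleaned = [re.sub(r"\s*FOCUS - \d+\s*$", "", piece).strip() for piece in articles]

print(repr(preamble))
print(cleaned)
```

This is why the later cells loop over `docs[1:]` rather than `docs`.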

In [27]:
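# read the whole export and split it into documents
with open('data/LexisNexis practice.TXT') as infile:
    data = infile.read()

docs = data.split('of 54 DOCUMENTS')

# docs[0] is the preamble before the first separator, so skip it
for dnum, doc in enumerate(docs[1:], start=1):
    # write each document to its own numbered file; 'with' closes the file
    with open('data/doc{}.txt'.format(dnum), 'w') as output:
        output.write(doc)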

In [33]:
print(docs[3])




                               The New York Times

                             January 5, 2015 Monday
                              Late Edition - Final

The Central Crisis in New York Education

BYLINE: By THE EDITORIAL BOARD

SECTION: Section A; Column 0; Editorial Desk; EDITORIAL; Pg. 16

LENGTH: 625 words


Gov. Andrew Cuomo's forthcoming State of the State address is expected to focus
on what can be done to improve public education across the state.

If he is serious about the issue, he will have to move beyond peripheral
concerns and political score-settling with the state teachers' union, which did
not support his re-election, and go to the heart of the matter. And that means
confronting and proposing remedies for the racial and economic segregation that
has gripped the state's schools, as well as the inequality in school funding
that prevents many poor districts from lifting their children up to state
standards.

These shameful inequities were fully brought to light in 2006, when the state's
highest court ruled in Campaign for Fiscal Equity v. State of New York that the
state had not met its constitutional responsibility to ensure adequate school
funding and in particular had shortchanged New York City.

A year later, the Legislature and Gov. Eliot Spitzer adopted a new formula that
promised more help for poor districts and eventually $7 billion per year in
added funding. That promise evaporated in the recession, spawning two lawsuits
aimed at forcing the state to honor it.

A lawsuit by a group called New Yorkers for Students' Educational Rights
estimates that, despite increases in recent years, the state is still about $5.6
billion a year short of its commitment under that formula.

A second lawsuit was filed on behalf of students in several small cities in the
state, including Jamestown, Port Jervis, Mount Vernon and Newburgh. It says that
per pupil funding in the cities, which have an average 72 percent student
poverty rate, is $2,500 to $6,300 less than called for in the 2007 formula,
making it impossible to provide the instruction other services needed to meet
the State Constitution's definition of a ''sound basic education.''

These communities and others like them are further disadvantaged by having low
property values and by a statewide cap enacted in 2011 that limits what money
they are able to raise through property taxes. And last year the New York State
United Teachers union said that the cap had been particularly harmful to poorer
districts.

These inequalities are compounded by the fact that New York State, which regards
itself as a bastion of liberalism, has the most racially and economically
segregated schools in the nation. A scathing 2014 study of this problem by the
Civil Rights Project at the University of California, Los Angeles, charged that
New York had essentially given up on this problem. It said, ''The children who
most depend on the public schools for any chance in life are concentrated in
schools struggling with all the dimensions of family and neighborhood poverty
and isolation.''

The Cuomo administration seemed not to acknowledge these issues in a letter last
month to the chancellor of the New York State Board of Regents and the
commissioner of education in which it promised ''an aggressive legislative
package'' to improve education in the state. Among the dozen issues it said it
wanted to address were strengthening the teacher evaluation system, improving
the process for removing low-performing teachers and improving teacher training.

The regents agreed that these were legitimate issues needing attention. But they
also noted that these reforms were unlikely to improve the schools unless they
were paired with new investments along the lines of the $2 billion in extra
spending that the regents had recommended earlier. No less pointedly, they urged
Mr. Cuomo to address the ''deeply disturbing inequalities in resources'' that
exist between poor and wealthy districts, as well as the destructive pattern of
segregation. Mr. Cuomo must take on both of these central issues.

URL:
http://www.nytimes.com/2015/01/05/opinion/the-central-crisis-in-new-york-educati
on.html

LANGUAGE: ENGLISH

DOCUMENT-TYPE: Editorial

PUBLICATION-TYPE: Newspaper

SUBJECT: SUITS & CLAIMS (90%); EDUCATION FUNDING (90%); LITIGATION (90%);
EDUCATION SYSTEMS & INSTITUTIONS (90%); EDITORIALS & OPINIONS (90%); TEACHING &
TEACHERS (89%); STUDENTS & STUDENT LIFE (89%); PUBLIC SCHOOLS (89%); US STATE
GOVERNMENT (79%); ELECTIONS (78%); LEGISLATIVE BODIES (78%); SCHOOL
DESEGREGATION (78%); DISCRIMINATION IN EDUCATION (78%); RIGHT TO EDUCATION
(78%); CIVIL RIGHTS (76%); RACE & RACISM (76%); REAL ESTATE (76%); POOR
POPULATION (75%); POVERTY & HOMELESSNESS (75%); LIBERALISM (72%); TEACHER UNIONS
(72%); DECISIONS & RULINGS (69%); SETTLEMENTS & DECISIONS (69%); SUPREME COURTS
(69%); POVERTY RATES (68%); PROPERTY TAX (61%); REAL ESTATE VALUATIONS (61%);
APPEALS (54%)

PERSON: ANDREW CUOMO (79%); ELIOT SPITZER (59%)

CITY: NEW YORK, NY, USA (92%); LOS ANGELES, CA, USA (79%)

STATE: NEW YORK, USA (96%); CALIFORNIA, USA (79%)

COUNTRY: UNITED STATES (96%)

LOAD-DATE: January 5, 2015


                   Copyright 2015 The New York Times Company


                           FOCUS - 5 

In [39]:
for doc in docs[1:]:
    print(doc.count('LANGUAGE:'))


1
1
1
1

In [40]:
# body is the text between LENGTH: number and LANGUAGE:

In [45]:
start = docs[1].find('LENGTH:')
end = docs[1].find('LANGUAGE:')

body = docs[1][start:end]
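
One caveat with this approach: `str.find` returns -1 when the marker is absent, and a slice that uses -1 quietly returns the wrong text instead of raising an error. A minimal sketch of a guard follows; the helper name `split_document` and the sample string are hypothetical.

```python
def split_document(doc, start_marker="LENGTH:", end_marker="LANGUAGE:"):
    """Return (pre_body, body, post_body), or None if a marker is missing."""
    start = doc.find(start_marker)
    # Only look for the end marker after the start marker.
    end = doc.find(end_marker, start + 1) if start != -1 else -1
    if start == -1 or end == -1:
        return None
    return doc[:start], doc[start:end], doc[end:]


# Made-up document with the same marker layout as the LexisNexis export.
sample = "Title\n\nLENGTH: 3 words\n\nOne two three.\n\nLANGUAGE: ENGLISH\n"

pre_body, body, post_body = split_document(sample)
print(body)

# A document without the markers is reported instead of being mis-sliced.
print(split_document("no markers here"))
```

The three pieces always concatenate back to the original document, which makes the split easy to check.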

In [52]:
# collect the body of every document in a list
doc_body = []
for doc in docs[1:]:
    start = doc.find('LENGTH:')
    end = doc.find('LANGUAGE:')
    doc_body.append(doc[start:end])

In [50]:
for doc in docs[1:]:
    start = doc.find('LENGTH:')
    end = doc.find('LANGUAGE:')
    pre_body = doc[:start]
    body = doc[start:end]
    post_body = doc[end:]


Out[50]:
4

In [53]:
import csv

In [55]:
a=[1,2,3,4,'abds','adsds']  # list

In [56]:
a[0]


Out[56]:
1

In [57]:
a[4]


Out[57]:
'abds'

In [58]:
a[2:5]


Out[58]:
[3, 4, 'abds']

Dictionary

A dictionary stores key : value pairs (also called name : value pairs).

Values are looked up by key rather than by position.


In [61]:
d1 = { 'item1': 'This is item 1', 'item2': 'This is item 2'}

In [62]:
d1


Out[62]:
{'item1': 'This is item 1', 'item2': 'This is item 2'}

In [63]:
d1['item2']


Out[63]:
'This is item 2'

In [64]:
d1['item1']


Out[64]:
'This is item 1'

In [65]:
d1['item3'] = a

In [66]:
d1


Out[66]:
{'item1': 'This is item 1',
 'item2': 'This is item 2',
 'item3': [1, 2, 3, 4, 'abds', 'adsds']}

In [67]:
d1['item3']


Out[67]:
[1, 2, 3, 4, 'abds', 'adsds']

In [68]:
d1['item3'][2]


Out[68]:
3

Spreadsheet structure

       col1 col2 col3
row1     1   t    2
row2     2   x    5
row3     3   a    5

In [69]:
# one dictionary per row; the keys are the column names
data = [
    {'col1': 1, 'col2': 't', 'col3': 2},
    {'col1': 2, 'col2': 'x', 'col3': 5},
    {'col1': 3, 'col2': 'a', 'col3': 5}
]

In [75]:
data


Out[75]:
[{'col1': 1, 'col2': 't', 'col3': 2},
 {'col1': 2, 'col2': 'x', 'col3': 5},
 {'col1': 3, 'col2': 'a', 'col3': 5}]

In [78]:
# newline='' lets the csv module control line endings itself
with open('data/test.csv', 'w', newline='') as outfile:
    out = csv.DictWriter(outfile,
                         fieldnames=['col1','col2','col3'])
    out.writeheader()
    out.writerows(data)

In [74]:
open('data/test.txt','w')


Out[74]:
<_io.TextIOWrapper name='data/test.txt' mode='w' encoding='UTF-8'>

In [81]:
with open('data/test.csv') as fh:
    for row in csv.DictReader(fh):
        print(row)


{'col2': 't', 'col3': '2', 'col1': '1'}
{'col2': 'x', 'col3': '5', 'col1': '2'}
{'col2': 'a', 'col3': '5', 'col1': '3'}

In [82]:
with open('data/test.csv') as fh:
    csv_data = list(csv.DictReader(fh))

In [83]:
csv_data


Out[83]:
[{'col1': '1', 'col2': 't', 'col3': '2'},
 {'col1': '2', 'col2': 'x', 'col3': '5'},
 {'col1': '3', 'col2': 'a', 'col3': '5'}]

In [84]:
csv_data[0]


Out[84]:
{'col1': '1', 'col2': 't', 'col3': '2'}

In [85]:
csv_data[1]['col2']


Out[85]:
'x'
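
Note that every value read back from the CSV file is a string (`'1'`, not `1`), because CSV has no notion of types. The sketch below converts the numeric columns back to integers. It reads from an in-memory string standing in for `data/test.csv`, so `csv_text` and `rows` are illustrative names only.

```python
import csv
import io

# In-memory stand-in for data/test.csv so the sketch runs without any files.
csv_text = "col1,col2,col3\n1,t,2\n2,x,5\n3,a,5\n"

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Convert the numeric columns back to int; col2 stays a string.
    row["col1"] = int(row["col1"])
    row["col3"] = int(row["col3"])
    rows.append(dict(row))

print(rows[1])
```

After the conversion the numeric columns can be used in arithmetic, for example `sum(r['col3'] for r in rows)`.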

In [86]:
doc_data = []

for doc in docs[1:]:
    start = doc.find('LENGTH:')
    end = doc.find('LANGUAGE:')
    pre_body = doc[:start]
    body = doc[start:end]
    post_body = doc[end:]
    
    row_dict = { 'pre_body': pre_body, 
                 'body': body,
                 'post_body': post_body }
    
    doc_data.append(row_dict)

In [92]:
print(doc_data[2]['pre_body'])




                               The New York Times

                             January 5, 2015 Monday
                              Late Edition - Final

The Central Crisis in New York Education

BYLINE: By THE EDITORIAL BOARD

SECTION: Section A; Column 0; Editorial Desk; EDITORIAL; Pg. 16



In [94]:
# newline='' matters here because the fields contain embedded line breaks
with open('data/docs.csv', 'w', newline='') as outfile:
    out = csv.DictWriter(outfile,
                         fieldnames=['pre_body','body','post_body'])
    out.writeheader()
    out.writerows(doc_data)
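
The document fields contain line breaks, commas and quotation marks, all of which the `csv` module handles by quoting the field. The sketch below checks this with a round trip through an in-memory buffer; the row content is invented, and `io.StringIO(newline="")` stands in for a file opened with `newline=''`, which is what the `csv` documentation recommends for files.

```python
import csv
import io

# Invented row with the same three fields as doc_data.
doc_rows = [
    {
        "pre_body": "Title\n\nBYLINE: By A. WRITER\n",
        "body": 'LENGTH: 5 words\n\nShe said, "hello, world."\n',
        "post_body": "LANGUAGE: ENGLISH\n",
    },
]

# Write the rows to an in-memory buffer instead of data/docs.csv.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["pre_body", "body", "post_body"])
writer.writeheader()
writer.writerows(doc_rows)

# Read the buffer back and compare with what was written.
buffer.seek(0)
restored = list(csv.DictReader(buffer))

print(dict(restored[0]) == doc_rows[0])
```

If the comparison holds, the multi-line text survived the trip unchanged, so the CSV file can be reloaded later without re-parsing the original export.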

Some regular expression matching


In [108]:
# doc is still the last document from the loop above;
# find every run of capital letters followed by a colon
re.findall(r'[A-Z]+:', doc)


Out[108]:
['BYLINE:',
 'SECTION:',
 'LENGTH:',
 'URL:',
 'GRAPHIC:',
 'CHART:',
 'LANGUAGE:',
 'TYPE:',
 'TYPE:',
 'SUBJECT:',
 'COMPANY:',
 'ORGANIZATION:',
 'STATE:',
 'COUNTRY:',
 'DATE:']
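
Two things stand out in this result. `DOCUMENT-TYPE:`, `PUBLICATION-TYPE:` and `LOAD-DATE:` are cut down to `TYPE:` and `DATE:` because `[A-Z]` does not match the hyphen, and `CHART:` is picked up from the middle of the `GRAPHIC:` line. One possible refinement, sketched on an invented sample, is to allow hyphens and anchor the match to the start of a line with `^` plus `re.MULTILINE`; the names `loose` and `anchored` are illustrative.

```python
import re

# Invented sample using field labels that appear in the export.
sample = (
    "BYLINE: By A. WRITER\n\n"
    "GRAPHIC: CHART: Where Housing Is Cheapest\n\n"
    "LANGUAGE: ENGLISH\n\n"
    "DOCUMENT-TYPE: Op-Ed\n\n"
    "LOAD-DATE: February 3, 2015\n"
)

# Letters only, anywhere in the text: truncates hyphenated labels.
loose = re.findall(r"[A-Z]+:", sample)

# Letters or hyphens, only at the start of a line.
anchored = re.findall(r"^[A-Z-]+:", sample, re.MULTILINE)

print(loose)
print(anchored)
```

`re.MULTILINE` only changes how `^` and `$` behave, so it has an effect in the anchored pattern but none in the loose one.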

In [111]:
print('\n\n=====\n\n'.join(doc.split('\n\n')))



=====


                               The New York Times

=====

                             January 4, 2015 Sunday
                              Late Edition - Final

=====

Is Life Better in America's Red States?

=====

BYLINE: By RICHARD FLORIDA.

=====

The director of the Martin Prosperity Institute at the Rotman School of
Management, University of Toronto, and a founder of The Atlantic's CityLab.

=====

SECTION: Section SR; Column 0; Sunday Review Desk; OPINION; Pg. 6

=====

LENGTH: 1196 words

=====


THE new Congress that starts work this week is the latest reminder of America's
stark political divisions: The parties in Washington are more polarized than
they have been in decades, the partisanship gap between rural Republicans and
urban Democrats has grown, and the battle for suburban voters keeps
intensifying. Much less is said, however, about the equally significant economic
division between conservative ''red states'' and liberal ''blue states.''

=====

Blue states, like California, New York and Illinois, whose economies turn on
finance, trade and knowledge, are generally richer than red states. But red
states, like Texas, Georgia and Utah, have done a better job over all of
offering a higher standard of living relative to housing costs. That basic
economic fact not only helps explain why the nation's electoral map got so much
redder in the November midterm elections, but also why America's prosperity is
in jeopardy.

=====

Red state economies based on energy extraction, agriculture and suburban sprawl
may have lower wages, higher poverty rates and lower levels of education on
average than those of blue states -- but their residents also benefit from much
lower costs of living. For a middle-class person , the American dream of a big
house with a backyard and a couple of cars is much more achievable in low-tax
Arizona than in deep-blue Massachusetts. As Jed Kolko, chief economist of
Trulia, recently noted, housing costs almost twice as much in deep-blue markets
($227 per square foot) than in red markets ($119).

=====

Driven by oil, the fracking boom and exurban sprawl, many of the red state
economies are experiencing a vigorous (if ultimately unsustainable) spurt of
growth. Thanks to loose land-use regulations and low labor costs, detached,
single-family homes can be churned out quite cheaply, generating more
middle-wage, low-skill jobs. And since red states spend less per capita on
education, infrastructure and social welfare than their blue state counterparts
(and many of them receive more federal dollars than they contribute), their tax
burdens are lower, too.

=====

To the surprise of many, voters in four red states -- Alaska, Nebraska, South
Dakota and Arkansas -- supported referendums in November to raise their state
minimum wage. And not just by a little. Controlling for the cost of living, they
will have wage floors that are higher than those of many blue states. Once
Obamacare is factored in, voters in these states ironically benefit from a
somewhat strengthened social safety net, even though it is one that their
elected politicians mainly oppose and that is heavily subsidized by blue state
tax dollars.

=====

For blue state urbanites who toil in low-paying retail, food preparation and
service jobs, for the journeyman tradespeople who once formed the heart of the
middle class, for teachers, civil servants, students and young families, the
American dream of homeownership -- or even an affordable rental apartment -- is
increasingly out of reach. Adding insult to injury, rapid gentrification in
these larger knowledge hubs brings the constant threat of displacement of
creative workers. For even the much better paid techies, engineers, financiers
and managers who are displacing them, the metropolitan version of the American
dream is a cramped condo or a small house and a long commute. Many are opting to
move to cheaper red states instead, further driving their growth.

=====

Inequality has grown fastest over the past three decades in larger states with
more vibrant knowledge economies, like Massachusetts, New York, New Jersey and
Connecticut. In 1979, the most unequal states were poor conservative states --
Mississippi, Louisiana, Arkansas, Alabama and Georgia. By 2012, New York,
Connecticut, California and Massachusetts joined Mississippi, Louisiana,
Florida, Georgia, Texas and Tennessee among the 10 most unequal states.

=====

Blue state knowledge economies are also extremely expensive to operate. Their
innovative edge turns on a high-cost infrastructure of research universities and
knowledge institutions -- a portion of which demand public subsidy. Their size
and density require expensive subway and transit systems to move people around.
Blue state cities like New York and San Francisco are booming, but they are
hampered by potholes and crumbling infrastructure, troubled public school
systems, growing inequality and housing unaffordability, and entrenched poor
populations, all of which mean higher public costs and higher tax burdens.

=====

And yet for all that, they are pioneering the new economic order that will
determine our future -- one that turns on innovation and knowledge rather than
the raw production of goods.

=====

Despite their longstanding divisions, red state and blue state economies depend
crucially on one another. Just as Alexander Hamilton's merchant cities ate and
exported the harvests of Thomas Jefferson's yeomen farmers, and New England
textile mills wove slave-harvested cotton, blue state knowledge economies run on
red state energy. Red state energy economies in their turn depend on dense
coastal cities and metro areas, not just as markets and sources of migrants, but
for the technology and talent they supply.

=====

Of course, while Massachusetts and Mississippi represent the extremes of
America's politico-economic divide, there are many red states like Utah, Arizona
and Texas that are growing their tech and knowledge economies, and a number of
historically blue states like Pennsylvania that have benefited from the fracking
boom. But in our increasingly competitive global economy, long-term prosperity
turns on knowledge, education and innovation. The idea that the red states can
enjoy the benefits provided by the blue states without helping to pay for them
(and while poaching their industries with the promise of low taxes and
regulations) is as irresponsible and destructive of our national future as it is
hypocritical.

=====

But that is exactly the mantra of the growing ranks of red state politicos. Gov.
Rick Perry of Texas, a likely 2016 G.O.P. presidential candidate, has taken to
bragging that his state's low-frills development strategy provides a model for
the nation as a whole. But fracking and sprawling your way to growth aren't a
sustainable national economic strategy.

=====

The allure of cheap growth has handed the red states a distinct political
advantage. Their economic system may be outmoded and obsolete, but it is strong
enough to blight the future. The Democrats may be able to draw on the country's
growing demographic diversity and the liberal leanings of younger voters to win
the presidency from time to time, but the real power dynamic is red.

=====

As long as the highly gerrymandered red states can keep on delivering the
economic goods to their voters, concerted federal action on transportation,
infrastructure, sustainability, education, a rational immigration policy and a
strengthened social safety net will remain out of reach. These are investments
that the future prosperity of the nation, in red states and blue states alike,
requires.

=====

Heightened partisan rancor is the least of our problems. The red state-blue
state divide threatens to kill the real American dream.

=====



=====


URL:
http://www.nytimes.com/2015/01/04/opinion/sunday/is-life-better-in-americas-red-
states.html

=====

GRAPHIC: CHART: Where Housing Is Cheapest: Of the most affordable of the 100
largest markets, almost all are politically red, or rust-belt cities that are
politically blue. (Sources: Jed Kolko, chief economist,Trulia (affordability
analysis)
  Office of Management  and Budget (metropolitan  divisions))

=====

LANGUAGE: ENGLISH

=====

DOCUMENT-TYPE: Op-Ed

=====

PUBLICATION-TYPE: Newspaper

=====

SUBJECT: POLITICAL PARTIES (90%); SUBURBS (89%); COST OF LIVING (89%); VOTERS &
VOTING (89%); TAXES & TAXATION (88%); WAGES & SALARIES (88%); ELECTIONS (78%);
US REPUBLICAN PARTY (78%); CAMPAIGNS & ELECTIONS (78%); CONSERVATISM (78%);
MIDTERM ELECTIONS (78%); POLITICS (78%); US DEMOCRATIC PARTY (78%); LIBERALISM
(78%); LAND USE PLANNING (77%); POOR POPULATION (77%); LIVING STANDARDS (76%);
US STATE GOVERNMENT (75%); SERVICE WORKERS (74%); MINIMUM WAGE (74%);
EDUCATIONAL INSTITUTION EMPLOYEES (74%); MIDDLE INCOME PERSONS (72%);
REFERENDUMS (72%); ELECTORAL DISTRICTS (72%); HYDRAULIC FRACTURING (72%); REAL
PROPERTY LAW (69%); CIVIL SERVICES (67%); OIL EXTRACTION (67%); POVERTY &
HOMELESSNESS (67%); TEACHING & TEACHERS (60%); POVERTY RATES (52%)

=====

COMPANY: TRULIA INC (56%)

=====

ORGANIZATION: OFFICE OF MANAGEMENT & BUDGET (59%)

=====

STATE: NEW YORK, USA (79%); TEXAS, USA (79%); CALIFORNIA, USA (79%); ILLINOIS,
USA (79%)

=====

COUNTRY: UNITED STATES (97%)

=====

LOAD-DATE: February 3, 2015

=====


                   Copyright 2015 The New York Times Company
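
The output above shows that splitting on blank lines puts each metadata field in its own block. A possible next step, sketched here under the assumption that every field starts with an upper-case label followed by a colon, is to collect those blocks into a dictionary. The sample text, `field_pattern` and `fields` are illustrative and not part of the original notebook.

```python
import re

# Invented tail of a document, laid out like the export above.
sample = (
    "LANGUAGE: ENGLISH\n\n"
    "DOCUMENT-TYPE: Op-Ed\n\n"
    "STATE: NEW YORK, USA (79%); TEXAS, USA (79%); CALIFORNIA, USA (79%); ILLINOIS,\n"
    "USA (79%)\n\n"
    "LOAD-DATE: February 3, 2015\n\n\n"
    "       Copyright 2015 The New York Times Company\n"
)

# Label (capitals and hyphens), a colon, then the rest of the block.
field_pattern = re.compile(r"^([A-Z-]+):\s*(.*)$", re.DOTALL)

fields = {}
for block in sample.split("\n\n"):
    match = field_pattern.match(block.strip())
    if match:
        name, value = match.groups()
        # Rejoin values that were wrapped across several lines.
        fields[name] = " ".join(value.split())

print(fields["STATE"].split("; "))
```

Blocks without a label, such as the copyright line and the body paragraphs, are skipped. A wrapped `URL:` value would need to be joined without a space, since the export breaks the address in the middle of a word.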


In [ ]: